Introduction

Analyzing the Life Expectancy and GDP in Cuba, Puerto Rico, and Morocco

This analysis explores the relationship between life expectancy and GDP per capita in three distinct regions: Cuba, Puerto Rico, and Morocco, using the gapminder dataset. Through a series of interactive scatterplots, this project visualizes how economic and health indicators evolve over time for each country.

The dataset highlights key trends: while Morocco has the lowest life expectancy and largest population, Puerto Rico boasts the highest life expectancy and GDP per capita. Cuba falls in between, with a stable GDP range but without significant correlation to life expectancy. The project offers a deep dive into these dynamics, showcasing how life expectancy increases correlate differently with GDP across these countries.


Data

Load Libraries and Packages

# Load required packages
if (!require("mosaic"))
  install.packages("mosaic")
if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("ggplot2"))
  install.packages("ggplot2")
if (!require("gganimate"))
  install.packages("gganimate")
if (!require("gifski"))
  install.packages("gifski")
if (!require("gapminder"))
  install.packages("gapminder")
if (!require("dplyr"))
  install.packages("dplyr")
if (!require("plotly"))
  install.packages("plotly")

library(mosaic) # Stats analysis
library(tidyverse) # Data packages
library(ggplot2) # Data visualization
library(gganimate) # Data visualization capabilities
library(gifski) # Render plots by gganimate
library(gapminder) # Data set 
library(dplyr) # Data manipulation
library(plotly) # Interactive plots

Description of Data

The Gapminder dataset offers global data on life expectancy, GDP per capita, and population across different countries over time. It includes records for countries like Cuba, Puerto Rico, and Morocco, capturing shifts in economic and health indicators. I chose this dataset because it allows for a deep dive into how life expectancy correlates with GDP in diverse regions, revealing trends that reflect broader socio-economic patterns and development.

Data set: gapminder

Load and Clean Data

# Load data
data <- gapminder

# Dataset dimensions
dim(data)
## [1] 1704    6
# Check and remove any missing values
sum(is.na(data))
## [1] 0

Preview & Filter Data

# Top 10 rows of data unfiltered
head(data, 10)
country continent year lifeExp pop gdpPercap
Afghanistan Asia 1952 28.801 8425333 779.4453
Afghanistan Asia 1957 30.332 9240934 820.8530
Afghanistan Asia 1962 31.997 10267083 853.1007
Afghanistan Asia 1967 34.020 11537966 836.1971
Afghanistan Asia 1972 36.088 13079460 739.9811
Afghanistan Asia 1977 38.438 14880372 786.1134
Afghanistan Asia 1982 39.854 12881816 978.0114
Afghanistan Asia 1987 40.822 13867957 852.3959
Afghanistan Asia 1992 41.674 16317921 649.3414
Afghanistan Asia 1997 41.763 22227415 635.3414
# Filter gapminder to include only cuba, puerto rico, and morocco
filtered_data <- data %>%
  filter(country %in% c("Cuba", "Puerto Rico", "Morocco"))

# View filtered data
head(filtered_data)
country continent year lifeExp pop gdpPercap
Cuba Americas 1952 59.421 6007797 5586.539
Cuba Americas 1957 62.325 6640752 6092.174
Cuba Americas 1962 65.246 7254373 5180.756
Cuba Americas 1967 68.290 8139332 5690.268
Cuba Americas 1972 70.723 8831348 5305.445
Cuba Americas 1977 72.649 9537988 6380.495

Variables

# Variables
names(data)
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"

The following variables are featured in this data visualization project:

  1. country
    A qualitative variable representing the country or region.

  2. continent
    A quantitative variable representing the continent where the country is located.

  3. year
    A quantitative variable representing the year of the data point.

  4. lifeExp
    A quantitative variable representing the life expectancy at birth, measured in years.

  5. pop
    A quantitative variable representing the total population of the country.

  6. gdpPercap
    A quantitative variable representing the per-capita GDP of the country, measured in US dollars.


Data Analysis

# Interactive illustration; continuously looping
inter_plot_loop <- ggplot(filtered_data, aes(x = lifeExp, y = gdpPercap, color = country, size = pop)) +
  # Scatterplot
  geom_point() +
  # Adding labels
  labs(title = "Life Expectancy vs GDP Per Capita by Country",
       x = "Life Expectancy",
       y = "GDP per Capita",
       size = "Population",
       color = "Country") +
  # Set animation to the year
  transition_time(year) +
  # Set the duration 
  ease_aes('linear') +
  # Add theme
  theme_minimal()

# Save as a GIF file and continuously loop
anim_save("life_expectancy_vs_gdp_loop.gif", inter_plot_loop)
# Print interactive illustration
inter_plot_loop

The interactive illustration below shows the change over time that includes a user interface slider that allows the you to adjust the year of your choosing.

# Interactive illustration; including a user interface slider
inter_plot_slider <- plot_ly(
  data = filtered_data,
  x = ~lifeExp,
  y = ~gdpPercap,
  color = ~country,
  size = ~pop,
  text = ~paste("Country:", country, "<br>Year:", year, "<br>Population:", pop),
  frame = ~year  # Use year variable for frames
  ) %>%
  # Add markers
  add_markers() %>%
  # Add Labels 
  layout(
    title = "Life Expectancy vs GDP per Capita by Country",
    xaxis = list(title = "Life Expectancy"),
    yaxis = list(title = "GDP per Capita")
  ) %>%
  # Add speed for illustration
  animation_opts() %>%
  # Add slider by year 
  animation_slider(currentvalue = list(prefix = "Year: "))

# Print interactive plot
inter_plot_slider

The visualizations above display the relationship between life expectancy and GDP per capita for Cuba, Puerto Rico, and Morocco. Each country is represented by a different color, and the size of the points corresponds to the population of each country in a given year. From these plots, several key observations can be made:

  • Morocco has the lowest life expectancy, while Puerto Rico has the highest life expectancy

  • Morocco has the largest population among the three countries

  • Puerto Rico has the highest GDP per capita

  • Morocco consistently kept its GDP per capita below 5,000, with a slow increase in GDP as life expectancy rises

  • Cuba maintained a GDP per capita between 5,000 and 10,000, with little to no significant increase in GDP as life expectancy grew

  • Puerto Rico experienced steady GDP growth, with a clear correlation between rising life expectancy and increasing GDP per capita


Tableau Story

Tableau Dashboard: Nicole Rodriguez - A Comparative Analysis of Cuba, Puerto Rico, and Morocco